---
title: "Tea, Earl Grey, Hot: Designing Speech Interactions from the Imagined Ideal of Star Trek"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    theme: spacelab
---

```{r setup, include=FALSE}
library(flexdashboard)
library(readr)
library(knitr)
library(tidyverse)
library(purrr)
library(broom)
library(plotly)
startrek <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-17/computer.csv')

```


### Data Description

```{r}
include_graphics('https://raw.githubusercontent.com/LaurS12/ERHS535_Group_Project/main/Images/data_description.png')

# Note: the image does not render in the editor preview, but it displays correctly once the dashboard is knitted.
```

***

Speech is now common in daily interactions with our devices, thanks to voice user interfaces (VUIs) like Alexa. Despite their seeming ubiquity, designs often do not match users’ expectations. Science fiction, which is known to influence the design of new technologies, has included VUIs for decades. Star Trek: The Next Generation is a prime example of how people envisioned ideal VUIs. Understanding how current VUIs live up to Star Trek’s utopian technologies reveals mismatches between current designs and user expectations, as informed by popular fiction. Combining conversational analysis and VUI user analysis, we study voice interactions with the Enterprise’s computer and compare them to current interactions. Independent of futuristic computing power, we find key design-based differences: Star Trek interactions are brief and functional rather than conversational; they are highly multimodal and context-driven; and there is often no spoken computer response. From this, we suggest paths to better align VUIs with user expectations.

Data source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-17/readme.md


### Chart 1: Lauren

```{r}

```

### Chart 2: Kim

```{r}

```

### Verbal vs. Non-Verbal Computer Responses by Primary Type of Voice Interaction

```{r, results='hide', message=FALSE, warning=FALSE}
# Lines spoken by the Computer itself, under every speaker label used in the scripts
computer_speakers <- c("Computer Voice", "Computer", "Computer (V.O.)",
                       "Computer (V.O)", "Computer Voice (V.O.)",
                       "New Computer Voice", "Com Panel (V.O.)",
                       "Computer'S Voice", "Computer (Voice)",
                       "Computer Voice (Cont'D)")

# Keep only interactions initiated by people (rows with a missing speaker are dropped too)
no_comp_voice <- startrek %>%
  filter(!is.na(char), !char %in% computer_speakers)

no_comp_voice <- no_comp_voice %>% 
  select('pri_type', 'nv_resp')

no_comp_voice$nv_resp <- as.factor(no_comp_voice$nv_resp)
no_comp_voice$pri_type <- as.factor(no_comp_voice$pri_type)

no_comp_voice$nv_resp <- no_comp_voice$nv_resp %>% 
  recode_factor("TRUE" = "Non-Verbal Response") %>% 
  recode_factor("FALSE" = "Verbal Response")

levels(no_comp_voice$pri_type)

no_comp_voice <- no_comp_voice %>% 
  group_by(pri_type, nv_resp) %>% 
  tally()

no_comp_voice <- no_comp_voice %>% 
  pivot_wider(names_from = nv_resp, values_from = n)

no_comp_voice[is.na(no_comp_voice)] = 0

no_comp_voice <- no_comp_voice %>% 
  rename(n_verbal = "Verbal Response") %>% 
  rename(n_non_verbal = "Non-Verbal Response")

no_comp_voice$total_resp <- no_comp_voice$n_verbal + no_comp_voice$n_non_verbal

no_comp_voice

prop_verbal <- no_comp_voice %>% 
  mutate(prop_test = purrr::map2(.x= n_verbal,
                                 .y= total_resp,
                                 .f= prop.test))


prop_verbal <- prop_verbal %>% 
  mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))

prop_non_verbal <- no_comp_voice %>% 
  mutate(prop_test = purrr::map2(.x= n_non_verbal,
                                 .y= total_resp,
                                 .f= prop.test))


prop_non_verbal <- prop_non_verbal %>% 
  mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))

prop_verbal <- prop_verbal%>% 
  unnest(prop_tidy)

prop_non_verbal <- prop_non_verbal%>% 
  unnest(prop_tidy)

prop_verbal <- prop_verbal %>% 
  select(-prop_test)

prop_non_verbal <- prop_non_verbal %>% 
  select(-prop_test)

prop_verbal <- prop_verbal %>% 
  select(pri_type, estimate, conf.low, conf.high, n_verbal) 

prop_non_verbal <- prop_non_verbal %>% 
  select(pri_type, estimate, conf.low, conf.high, n_non_verbal) 

prop_verbal <- prop_verbal %>% 
  mutate(estimate = as.numeric(estimate),
         conf.low = as.numeric(conf.low),
         conf.high = as.numeric(conf.high))

prop_non_verbal <- prop_non_verbal %>% 
  mutate(estimate = as.numeric(estimate),
         conf.low = as.numeric(conf.low),
         conf.high = as.numeric(conf.high))

prop_verbal <- prop_verbal %>% 
  arrange(desc(estimate))

prop_non_verbal <- prop_non_verbal %>% 
  arrange(desc(estimate))

prop_verbal$resp <- "Verbal"
prop_non_verbal$resp <- "Non-Verbal"

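# bind_rows() matches columns by name; rbind() errors because n_verbal and
# n_non_verbal differ between the two tables. Missing counts become NA (set to 0 below).
resp_per_int <- bind_rows(prop_verbal, prop_non_verbal)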

resp_per_int[is.na(resp_per_int)] = 0

resp_per_int$n <- resp_per_int$n_non_verbal + resp_per_int$n_verbal

resp_per_int <- resp_per_int %>% 
  select(-n_non_verbal) %>% 
  select(-n_verbal)

resp_per_int$resp <- as.factor(resp_per_int$resp)

resp_per_int$pri_type <- factor(resp_per_int$pri_type, levels = c("Password", "Conversation", "Question", "Wake Word", "Comment", "Command", "Statement"))
```

```{r, include=FALSE}
# The extra label aesthetics are not drawn by ggplot2; ggplotly() shows them in
# the hover tooltip (lower and upper 95% confidence bounds, and the count)
chart_3 <- resp_per_int %>%
  ungroup() %>% 
  ggplot(aes(label=conf.low, 
             label2=conf.high,
             label3=n))+
  geom_col(aes(x=estimate, y=pri_type, fill=resp), position="fill")+
  labs(title= "Proportions of Computer Response Type",
       y= "Person Interaction Type",
       x= "Percent of Responses",
       subtitle = "Hover over a bar to see its 95% confidence interval",
       fill = "")+
  scale_x_continuous(labels = scales::percent)+
  scale_fill_brewer(palette = "Paired")+
  # theme_bw() first, so the custom title adjustment is not overwritten
  theme_bw()+
  theme(plot.title = element_text(hjust = -0.45, vjust=2.12))
```

```{r}
ggplotly(chart_3, height=400, width=800) 
```


***

When we interact with voice-command technology, we use certain types of interactions to 'wake' the system ("Hey Siri..."), 'command' the system ("Play a song on Spotify"), 'question' the system ("What is the temperature for today?"), and many other types of interactions. 

These interaction types can be chained, as in "Hey Siri, play a song on Spotify." In this phrase, however, the primary interaction type is the command asking Siri to play a song on Spotify. 

On the Starship Enterprise, the crew interacts with the Computer through different primary 'Interaction Types'. Definitions and examples of these interaction types can be found below. 

| Interaction Type | Definition                                                                                                        | Examples                                                |
|------------------|-------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| Command          | Utterances that directly tell the computer what to do.                                                            | Run a diagnostic on the port nacelle.                   |
| Question         | Utterances that ask the computer for something.                                                                   | Where is Captain Picard?                                |
| Statement        | Utterances that neither tell nor ask the computer anything directly; the meaning is inferred.                    | Deck four. I wish to learn about Earth.                 |
| Password         | Utterances that contain a password.                                                                               | This is Captain Picard.                                 |
| Wake Word        | Key phrases used to activate the computer.                                                                        | Computer. Holodeck.                                     |
| Comment          | Utterances that have no intended action for the computer.                                                         | Excellent. Ferrazene has a complex molecular structure. |
| Conversation     | Utterances that are more like human conversation, such as phatic expressions, formalities, and colloquial speech. | Well, check it again! Then run it for us, dear.         |

Because the Computer on the Starship Enterprise can generate objects and display information without speaking, it is worth examining how often it responds verbally versus non-verbally (non-verbal responses include those made through actions alone). 
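A minimal base-R sketch of the tabulation described above: count responses per interaction type, then convert each row of counts to proportions. The data frame `toy` and its values are hypothetical and only illustrate the idea; the dashboard's own code works on the full dataset with dplyr and tidyr instead.

```r
# Hypothetical toy data (illustrative only, not from the dataset): one row per
# utterance, with its primary interaction type and whether the computer's
# response was non-verbal (TRUE) or verbal (FALSE).
toy <- data.frame(
  pri_type = c("Command", "Command", "Command", "Question", "Question", "Wake Word"),
  nv_resp  = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Turn the logical flag into a readable factor with a fixed level order.
toy$resp <- factor(ifelse(toy$nv_resp, "Non-Verbal", "Verbal"),
                   levels = c("Verbal", "Non-Verbal"))

# Cross-tabulate counts, then divide each row by its total.
counts <- table(toy$pri_type, toy$resp)
props  <- prop.table(counts, margin = 1)
props
```

A combination that never occurs shows up as a count of 0 rather than a missing value, which plays the same role as replacing `NA` with 0 in the chart's code.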

The visualization to the left shows the proportion of verbal versus non-verbal Computer responses for each type of interaction initiated by a person. It helps show which types of interactions are more likely to result in a verbal or a non-verbal response from the Starship Enterprise Computer. 

Based on the proportion tests, Wake Word, Question, Conversation, and Password interactions are the most likely to result in a verbal response from the Computer, while Statement, Command, and Comment interactions result in verbal and non-verbal responses at fairly similar rates. One limitation of this analysis is that sample sizes for some combinations of interaction and response type are small.
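For a given interaction type, the verbal and non-verbal proportion tests share the same total, so their estimates are complementary and their 95% confidence intervals mirror each other. The sketch below checks this with `prop.test()`, mirroring the analysis code for this chart; the counts (12 verbal and 8 non-verbal responses) are hypothetical, not values from the dataset.

```r
# Hypothetical counts for a single interaction type (illustrative only)
n_verbal     <- 12
n_non_verbal <- 8
total        <- n_verbal + n_non_verbal

# Test each response type against the same total
verbal_test     <- prop.test(n_verbal, total)
non_verbal_test <- prop.test(n_non_verbal, total)

est_v  <- unname(verbal_test$estimate)      # 0.6
est_nv <- unname(non_verbal_test$estimate)  # 0.4

# The two estimates are complementary...
est_v + est_nv

# ...and each confidence interval is the mirror image of the other
ci_v  <- as.numeric(verbal_test$conf.int)
ci_nv <- as.numeric(non_verbal_test$conf.int)
all.equal(ci_nv, 1 - rev(ci_v))
```

This is also why `position = "fill"` in the chart's `geom_col()` call leaves the bars essentially unchanged: the two estimates for each interaction type already sum to 1.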

Data source: http://www.speechinteraction.org/TNG/TeaEarlGreyHotDatasetCodeBook.pdf

### Chart 4: Stacey

```{r}

```

### How common is each word?

```{r, message=FALSE, warning=FALSE}
# Packages (tidyverse is already loaded in the setup chunk)
library(wordcloud2)
library(tm)

# Use the spoken lines of all characters
text <- startrek$interaction

# Clean text: drop numbers and punctuation, lower-case everything,
# remove common English stop words, then collapse extra whitespace
docs <- Corpus(VectorSource(text)) %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(stripWhitespace)

# Count how often each word occurs across all lines
dtm <- TermDocumentMatrix(docs)
term_matrix <- as.matrix(dtm)
words <- sort(rowSums(term_matrix), decreasing = TRUE)
df <- data.frame(word = names(words), freq = words)

# Remove the extreme outlier "computer" so that it does not dominate the cloud
df <- df[df$word != "computer", ]

# Word cloud
wordcloud2(data = df, size = 2, color = "random-light", shape = "circle", backgroundColor = "black")
```

*** 

When analyzing text, a natural question is how often each word is used. Word clouds are a popular way to visualize this: each word is drawn at a size scaled to how often it was used. 

This image was created from the spoken lines of all the characters, with each word counted individually after removing numbers, punctuation, and common English stop words. In the cloud, "program" appears to be the most common word, with 193 uses; however, the most frequently used word overall was "computer", with 1,036 uses. It wouldn't be much of a word cloud if a single word made up the whole cloud, so removing this extreme outlier gave us a much more readable image of the vocabulary of Star Trek speech.
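The chart's code does the counting and outlier removal described above with the tm package. As a dependency-free illustration, here is a sketch of the same idea in base R. It swaps tm for plain regular expressions, and the dialogue lines and the tiny stop-word list are hypothetical (not tm's English list, and not data from the dataset).

```r
# Hypothetical lines of dialogue (illustrative only)
dialogue <- c("Computer, run a diagnostic.",
              "Computer: locate Commander Riker.",
              "Run the program again, computer.")

# Tiny hypothetical stop-word list; the chart itself uses tm's stopwords("english")
stop_words <- c("a", "the", "again")

# Strip punctuation and digits, lower-case, then split on whitespace
tokens <- unlist(strsplit(tolower(gsub("[[:punct:][:digit:]]", "", dialogue)), "\\s+"))
tokens <- tokens[tokens != "" & !tokens %in% stop_words]

# Count each word and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)
word_freq <- data.frame(word = names(freq), freq = as.integer(freq))

# Drop the dominant outlier word before drawing the cloud
word_freq <- word_freq[word_freq$word != "computer", ]
head(word_freq)
```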